Exponential Distribution (expon)#

The exponential distribution models waiting times between events when events happen at a constant average rate and the process has no memory.

It’s a cornerstone of applied probability because it links directly to the Poisson process, appears as a special case of the Gamma distribution, and is the continuous analogue of the geometric distribution.

What you’ll learn#

  • what expon models and when it’s appropriate

  • the PDF/CDF/survival/hazard functions (with LaTeX)

  • key moments, MGF/characteristic function, and entropy

  • how parameters change the shape (rate vs scale)

  • core derivations: expectation, variance, likelihood + MLE

  • NumPy-only sampling via inverse transform + Monte Carlo validation

  • SciPy usage: scipy.stats.expon (pdf, cdf, rvs, fit)

  • hypothesis testing, Bayesian updates, and generative modeling patterns

Notebook roadmap#

  1. Title & classification

  2. Intuition & motivation

  3. Formal definition (PDF/CDF)

  4. Moments & properties

  5. Parameter interpretation

  6. Derivations ((\mathbb{E}[X]), (\mathrm{Var}(X)), likelihood)

  7. Sampling & simulation (NumPy-only)

  8. Visualization (PDF, CDF, Monte Carlo)

  9. SciPy integration (scipy.stats.expon)

  10. Statistical use cases

  11. Pitfalls

  12. Summary

import math

import numpy as np
import scipy
from scipy import stats

import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

SEED = 7
rng = np.random.default_rng(SEED)

np.set_printoptions(precision=4, suppress=True)

print("numpy ", np.__version__)
print("scipy ", scipy.__version__)
print("plotly", plotly.__version__)
numpy  1.26.2
scipy  1.15.0
plotly 6.5.2

Prerequisites & notation#

Prerequisites

  • comfort with basic calculus (integration by parts)

  • basic probability (PDF/CDF, expectation)

Notation

There are two common parameterizations:

  • Rate (\lambda > 0): average number of events per unit time.

  • Scale (\theta > 0): mean waiting time, with (\theta = 1/\lambda).

In this notebook, we primarily use the rate (\lambda). When we use SciPy, we map to its parameterization:

  • scipy.stats.expon(loc=0, scale=θ) where (\theta = 1/\lambda).

1) Title & classification#

  • Name: expon (Exponential distribution)

  • Type: continuous

  • Support: (x \in [0, \infty))

  • Parameter space:

    • rate form: (\lambda \in (0, \infty))

    • scale form: (\theta \in (0, \infty)), where (\theta = 1/\lambda)

A shifted form sometimes appears with a location parameter (\text{loc}): (x \ge \text{loc}). SciPy exposes this as loc.

2) Intuition & motivation#

What it models#

The exponential distribution is the canonical model for a waiting time until the next event when:

  1. events arrive at a constant average rate, and

  2. the process is memoryless.

Typical real-world use cases#

  • Queueing / arrivals: time between customer arrivals (Poisson process assumption)

  • Reliability: time-to-failure under a constant hazard rate

  • Survival analysis: baseline model before introducing covariates or non-constant hazards

  • Networking: simplified models of packet arrival gaps

Key relations to other distributions#

  • Poisson process: if events arrive as a Poisson process with rate (\lambda), inter-arrival times are i.i.d. (\mathrm{Exp}(\lambda))

  • Gamma: (\mathrm{Exp}(\lambda)) is (\mathrm{Gamma}(k=1,\lambda)) (shape 1)

  • Geometric (discrete analogue): counts the number of trials until the first success; the exponential is its continuous waiting-time analogue

  • Min of exponentials: if (X_i \sim \mathrm{Exp}(\lambda_i)) independent, then (\min_i X_i \sim \mathrm{Exp}(\sum_i \lambda_i))

Memorylessness (core intuition)#

For (X \sim \mathrm{Exp}(\lambda)), for (s,t \ge 0):

[ \mathbb{P}(X > s+t \mid X > s) = \mathbb{P}(X > t). ]

This is why the exponential distribution is often used as the “default” waiting-time model.

A one-line proof uses the survival function (S(x)=\mathbb{P}(X>x)=e^{-\lambda x}):

[ \mathbb{P}(X>s+t\mid X>s)=\frac{S(s+t)}{S(s)}=\frac{e^{-\lambda(s+t)}}{e^{-\lambda s}}=e^{-\lambda t}=\mathbb{P}(X>t). ]
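A quick Monte Carlo sanity check of this identity (a sketch added here, not part of the derivation; rate, thresholds, and sample size are illustrative): estimate both sides from NumPy samples and compare to (e^{-\lambda t}).

```python
# Estimate both sides of P(X > s+t | X > s) = P(X > t) by Monte Carlo.
import numpy as np

rng_check = np.random.default_rng(0)
lam, s, t = 1.5, 0.4, 0.8
x = rng_check.exponential(scale=1 / lam, size=500_000)

# Left side: conditional survival among samples that already exceeded s.
lhs = np.mean(x[x > s] > s + t)
# Right side: unconditional survival beyond t.
rhs = np.mean(x > t)

print(lhs, rhs, np.exp(-lam * t))  # all three should be close
```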

3) Formal definition#

Let (X \sim \mathrm{Exp}(\lambda)) with (\lambda>0).

PDF#

[ f(x\mid\lambda) = \lambda e^{-\lambda x}, \quad x \ge 0. ]

(And (f(x\mid\lambda)=0) for (x<0).)

CDF#

[ F(x\mid\lambda) = \mathbb{P}(X \le x) = 1 - e^{-\lambda x}, \quad x \ge 0. ]

Survival function#

[ S(x\mid\lambda) = \mathbb{P}(X > x) = e^{-\lambda x}. ]

Hazard rate#

[ h(x) = \frac{f(x)}{S(x)} = \lambda. ]

The hazard is constant, which is a strong modeling assumption.
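As a numerical illustration (the rate and grid below are arbitrary choices), the ratio (f(x)/S(x)) is flat across the support:

```python
# Verify numerically that the hazard f(x)/S(x) is constant at λ.
import numpy as np
from scipy import stats

lam = 1.7
rv = stats.expon(scale=1 / lam)
xs = np.linspace(0.1, 5.0, 50)

hazard = rv.pdf(xs) / rv.sf(xs)  # sf is the survival function S(x)
print(hazard.min(), hazard.max())  # both ≈ 1.7
```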

4) Moments & properties#

Moments#

For (X \sim \mathrm{Exp}(\lambda)):

  • Mean: (\mathbb{E}[X] = \tfrac{1}{\lambda})

  • Variance: (\mathrm{Var}(X) = \tfrac{1}{\lambda^2})

  • Skewness: (2)

  • (Excess) kurtosis: (6) (so kurtosis (= 9))

  • Median: (\tfrac{\ln 2}{\lambda})
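A one-line check (illustrative, using SciPy's frozen distribution) that the closed-form median matches the quantile function:

```python
# The median ln(2)/λ should agree with SciPy's median() (i.e. ppf(0.5)).
import numpy as np
from scipy import stats

lam = 2.5
median_formula = np.log(2) / lam
median_scipy = stats.expon(scale=1 / lam).median()
print(median_formula, median_scipy)
```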

MGF and characteristic function#

  • MGF (for (t < \lambda)): [ M_X(t) = \mathbb{E}[e^{tX}] = \frac{\lambda}{\lambda - t}. ]

  • Characteristic function: [ \varphi_X(t) = \mathbb{E}[e^{itX}] = \frac{\lambda}{\lambda - it}. ]
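The MGF formula can be spot-checked by Monte Carlo (a sketch; the rate, evaluation point, and sample size below are arbitrary, with (t < \lambda) required for the expectation to exist):

```python
# Monte Carlo check of the MGF: E[e^{tX}] should equal λ/(λ - t) for t < λ.
import numpy as np

rng_mgf = np.random.default_rng(1)
lam, t = 2.0, 0.5
x = rng_mgf.exponential(scale=1 / lam, size=1_000_000)

mgf_mc = np.exp(t * x).mean()
mgf_closed = lam / (lam - t)
print(mgf_mc, mgf_closed)  # ≈ 1.3333
```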

Entropy (differential, in nats)#

[ H(X) = 1 - \ln \lambda. ]
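SciPy can confirm the entropy formula directly (a quick check, not in the original text; the rate is illustrative):

```python
# Check the differential entropy H = 1 - ln λ against SciPy's entropy().
import numpy as np
from scipy import stats

lam = 1.7
h_formula = 1 - np.log(lam)
h_scipy = float(stats.expon(scale=1 / lam).entropy())
print(h_formula, h_scipy)
```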

Other notable properties#

  • Mode: (0)

  • Memoryless: (\mathbb{P}(X>s+t \mid X>s)=\mathbb{P}(X>t))

  • Min stability: (\min_i X_i) is exponential with rate (\sum_i \lambda_i) (independent case)

  • Sums: if (X_1,\dots,X_n) i.i.d. (\mathrm{Exp}(\lambda)), then (\sum_i X_i \sim \mathrm{Gamma}(n,\lambda))
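The last two properties are easy to check by simulation (rates, n, and sample sizes below are illustrative):

```python
# Min stability: E[min_i X_i] should be 1 / sum(rates).
# Sums: mean of sum of n i.i.d. Exp(λ) should match the Gamma(n, λ) mean n/λ.
import numpy as np

rng_props = np.random.default_rng(2)
rates = np.array([0.5, 1.0, 2.0])
n_sim = 400_000

# One column per rate; the row-wise min is Exp(sum(rates)).
draws = rng_props.exponential(scale=1 / rates, size=(n_sim, rates.size))
min_mean = draws.min(axis=1).mean()
print(min_mean, 1 / rates.sum())  # ≈ 0.2857

lam, n = 1.5, 5
sums = rng_props.exponential(scale=1 / lam, size=(n_sim, n)).sum(axis=1)
print(sums.mean(), n / lam)  # ≈ 3.33
```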

def exp_pdf(x: np.ndarray, rate: float) -> np.ndarray:
    x = np.asarray(x)
    return np.where(x >= 0, rate * np.exp(-rate * x), 0.0)


def exp_cdf(x: np.ndarray, rate: float) -> np.ndarray:
    x = np.asarray(x)
    return np.where(x >= 0, 1.0 - np.exp(-rate * x), 0.0)


def sample_expon_inverse(n: int, rate: float, rng: np.random.Generator) -> np.ndarray:
    '''NumPy-only inverse-CDF sampling for Exp(rate).

    If U ~ Uniform(0, 1), then X = -log(1-U)/rate ~ Exp(rate).
    We use log1p for numerical stability when U is close to 0.
    '''

    if rate <= 0:
        raise ValueError("rate must be > 0")
    u = rng.random(n)
    return -np.log1p(-u) / rate


def exp_loglik(rate: float, x: np.ndarray) -> float:
    x = np.asarray(x)
    if rate <= 0 or np.any(x < 0):
        return -np.inf
    return x.size * math.log(rate) - rate * float(x.sum())


# Quick Monte Carlo check of mean/variance
rate = 1.7
x_mc = sample_expon_inverse(200_000, rate=rate, rng=rng)

x_mc.mean(), x_mc.var(), (1 / rate), (1 / rate**2)
(0.5876066849609238,
 0.34170872750465553,
 0.5882352941176471,
 0.34602076124567477)

5) Parameter interpretation#

Rate (\lambda)#

  • Larger (\lambda) (\Rightarrow) events happen “more frequently” (\Rightarrow) shorter typical waiting times.

  • (\mathbb{E}[X] = 1/\lambda), so (\lambda) is the inverse of the mean waiting time.

Scale (\theta)#

  • (\theta = 1/\lambda)

  • Larger (\theta) (\Rightarrow) waiting times are typically longer.

Shape changes#

  • All exponential PDFs have their mode at 0 and decay exponentially.

  • The decay rate is controlled entirely by (\lambda).

rates = [0.5, 1.0, 2.0]
x = np.linspace(0, 8, 400)

fig = go.Figure()
for r in rates:
    fig.add_trace(
        go.Scatter(
            x=x,
            y=exp_pdf(x, r),
            mode="lines",
            name=f"λ={r:g} (mean={1/r:g})",
        )
    )
    fig.add_vline(x=1 / r, line_dash="dot", opacity=0.35)

fig.update_layout(
    title="Exponential PDF for different rates",
    xaxis_title="x",
    yaxis_title="f(x)",
)
fig.show()

6) Derivations#

We derive (\mathbb{E}[X]), (\mathrm{Var}(X)), and the likelihood/MLE.

Expectation#

[ \mathbb{E}[X] = \int_0^\infty x\, \lambda e^{-\lambda x}\,dx. ]

Integration by parts: let (u=x), (dv=\lambda e^{-\lambda x}\,dx). Then (du=dx) and (v=-e^{-\lambda x}).

[ \mathbb{E}[X] = \left[-x e^{-\lambda x}\right]_0^\infty + \int_0^\infty e^{-\lambda x}\,dx = 0 + \left[\frac{-1}{\lambda}e^{-\lambda x}\right]_0^\infty = \frac{1}{\lambda}. ]

Second moment and variance#

Compute (\mathbb{E}[X^2]): [ \mathbb{E}[X^2] = \int_0^\infty x^2\, \lambda e^{-\lambda x}\,dx. ]

Integration by parts again: let (u=x^2), (dv=\lambda e^{-\lambda x}\,dx). Then (du=2x\,dx) and (v=-e^{-\lambda x}).

[ \mathbb{E}[X^2] = \left[-x^2 e^{-\lambda x}\right]_0^\infty + \int_0^\infty 2x\, e^{-\lambda x}\,dx. ]

Now compute (\int_0^\infty x e^{-\lambda x}\,dx) (integration by parts with (u=x), (dv=e^{-\lambda x}\,dx), (v=-\tfrac{1}{\lambda}e^{-\lambda x})):

[ \int_0^\infty x e^{-\lambda x}\,dx = \left[\frac{-x}{\lambda}e^{-\lambda x}\right]_0^\infty + \int_0^\infty \frac{1}{\lambda}e^{-\lambda x}\,dx = 0 + \frac{1}{\lambda^2}. ]

So: [ \mathbb{E}[X^2] = 2\cdot \frac{1}{\lambda^2} = \frac{2}{\lambda^2}. ]

Therefore: [ \mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}. ]

Likelihood and MLE#

Given i.i.d. samples (x_1,\dots,x_n) with (x_i\ge 0):

[ L(\lambda) = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \lambda^n \exp\left(-\lambda \sum_{i=1}^n x_i\right). ]

Log-likelihood: [ \ell(\lambda) = n\ln\lambda - \lambda \sum_{i=1}^n x_i. ]

Differentiate and set to zero: [ \ell'(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^n x_i = 0 \quad\Rightarrow\quad \hat{\lambda}_{\text{MLE}} = \frac{n}{\sum_i x_i} = \frac{1}{\bar{x}}. ]

Equivalently, in scale form (\hat{\theta} = \bar{x}).

# MLE demo on simulated data
true_rate = 1.25
n = 400
x = sample_expon_inverse(n, rate=true_rate, rng=rng)

rate_mle = n / x.sum()

loglik_true = exp_loglik(true_rate, x)
loglik_mle = exp_loglik(rate_mle, x)

true_rate, rate_mle, loglik_true, loglik_mle
(1.25, 1.2681344724009203, -305.02253233559685, -304.9812395274336)
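A natural follow-up, sketched here under the usual asymptotics (not part of the original demo; seed and sample size are arbitrary): the Fisher information per observation is (1/\lambda^2), so (\mathrm{se}(\hat\lambda) \approx \hat\lambda/\sqrt{n}), which gives an approximate Wald interval for the rate.

```python
# Hedged sketch: an approximate 95% Wald interval for λ based on the MLE.
# Fisher information per observation is 1/λ², so se(λ̂) ≈ λ̂ / sqrt(n).
import numpy as np

rng_ci = np.random.default_rng(3)
true_rate, n = 1.25, 400
x = rng_ci.exponential(scale=1 / true_rate, size=n)

rate_mle = n / x.sum()       # MLE: 1 / sample mean
se = rate_mle / np.sqrt(n)   # plug-in standard error
z = 1.96                     # normal quantile for 95%
ci = (rate_mle - z * se, rate_mle + z * se)
print(rate_mle, ci)
```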

7) Sampling & simulation (NumPy-only)#

Inverse transform sampling#

If (U \sim \mathrm{Uniform}(0,1)), then using the CDF (F(x)=1-e^{-\lambda x}), we solve (U = F(X)):

[ U = 1 - e^{-\lambda X} \quad\Rightarrow\quad 1-U = e^{-\lambda X} \quad\Rightarrow\quad X = -\frac{\ln(1-U)}{\lambda}. ]

That yields a simple algorithm:

  1. draw (U\in(0,1))

  2. return (X = -\ln(1-U)/\lambda)

Numerical note: use log1p(-U) rather than log(1-U); when (U) is tiny, (1-U) rounds to exactly 1 in floating point, so the naive logarithm returns 0 and the draw is lost.
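A two-line demonstration of why this matters (the value of u is illustrative):

```python
# For very small u, 1 - u rounds to exactly 1.0 in double precision,
# so log(1 - u) returns 0.0 while log1p(-u) keeps the tiny value.
import numpy as np

u = 1e-20
print(np.log(1 - u))   # 0.0 — the draw collapses to X = 0
print(np.log1p(-u))    # -1e-20 — tiny but nonzero, as it should be
```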

# Sampling: compare histogram to the true PDF
rate = 1.7
n = 50_000
samples = sample_expon_inverse(n, rate=rate, rng=rng)

x_grid = np.linspace(0, np.quantile(samples, 0.995), 400)

fig = px.histogram(
    samples,
    nbins=60,
    histnorm="probability density",
    title=f"Monte Carlo samples vs PDF (n={n}, λ={rate:g})",
    labels={"value": "x"},
)
fig.add_trace(
    go.Scatter(x=x_grid, y=exp_pdf(x_grid, rate), mode="lines", name="true pdf")
)
fig.update_layout(yaxis_title="density")
fig.show()

8) Visualization (PDF, CDF, Monte Carlo)#

We’ll visualize:

  • the PDF for multiple rates

  • the CDF and an empirical CDF from Monte Carlo samples

# PDF and CDF side-by-side for multiple rates
rates = [0.5, 1.0, 2.0]
x = np.linspace(0, 8, 500)

fig_pdf = go.Figure()
fig_cdf = go.Figure()

for r in rates:
    fig_pdf.add_trace(go.Scatter(x=x, y=exp_pdf(x, r), mode="lines", name=f"λ={r:g}"))
    fig_cdf.add_trace(go.Scatter(x=x, y=exp_cdf(x, r), mode="lines", name=f"λ={r:g}"))

fig_pdf.update_layout(title="Exponential PDF", xaxis_title="x", yaxis_title="f(x)")
fig_cdf.update_layout(title="Exponential CDF", xaxis_title="x", yaxis_title="F(x)")

fig_pdf.show()
fig_cdf.show()
# Empirical CDF vs true CDF
rate = 1.7
n = 20_000
samples = sample_expon_inverse(n, rate=rate, rng=rng)

xs = np.sort(samples)
ys = np.arange(1, n + 1) / n

x_grid = np.linspace(0, np.quantile(xs, 0.995), 400)

fig = go.Figure()
fig.add_trace(go.Scatter(x=xs, y=ys, mode="lines", name="empirical CDF"))
fig.add_trace(go.Scatter(x=x_grid, y=exp_cdf(x_grid, rate), mode="lines", name="true CDF"))
fig.update_layout(
    title=f"Empirical CDF vs true CDF (n={n}, λ={rate:g})",
    xaxis_title="x",
    yaxis_title="F(x)",
)
fig.show()

9) SciPy integration (scipy.stats.expon)#

SciPy uses a location-scale parameterization:

  • expon(loc=ℓ, scale=θ) has support (x\ge \ell)

  • for the canonical exponential with rate (\lambda): set (\ell=0), (\theta=1/\lambda)

So if you think in rates, remember: [ \lambda = 1/\text{scale}. ]

from scipy.stats import expon

rate = 1.7
scale = 1 / rate
rv = expon(loc=0.0, scale=scale)

x_grid = np.linspace(0, 6, 400)

pdf_scipy = rv.pdf(x_grid)
cdf_scipy = rv.cdf(x_grid)

# SciPy sampling
samples_scipy = rv.rvs(size=10_000, random_state=rng)

# Fit: for an exponential with loc=0, the MLE for scale is the sample mean.
loc_hat, scale_hat = expon.fit(samples_scipy, floc=0.0)
rate_hat = 1 / scale_hat

(loc_hat, scale_hat, rate_hat)
(0.0, 0.593214983382439, 1.6857295045012564)
# Compare SciPy PDF to our PDF implementation
rate = 1.7
scale = 1 / rate
rv = expon(loc=0.0, scale=scale)

x_grid = np.linspace(0, 6, 400)

max_abs_diff = np.max(np.abs(rv.pdf(x_grid) - exp_pdf(x_grid, rate)))
max_abs_diff
1.1102230246251565e-16

10) Statistical use cases#

A) Hypothesis testing (rate parameter)#

If (X_1,\dots,X_n) are i.i.d. (\mathrm{Exp}(\lambda)), then the sum (S = \sum_i X_i) has a Gamma distribution:

[ S \sim \mathrm{Gamma}(\text{shape}=n,\ \text{rate}=\lambda). ]

This gives an exact way to test hypotheses about (\lambda) using (S).

B) Bayesian modeling (Gamma prior on (\lambda))#

A Gamma prior is conjugate:

  • prior: (\lambda \sim \mathrm{Gamma}(\alpha_0,\beta_0)) (rate (\beta_0))

  • likelihood: i.i.d. exponential data

  • posterior: (\lambda \mid x \sim \mathrm{Gamma}(\alpha_0+n,\beta_0+\sum x_i))

C) Generative modeling (Poisson process)#

A Poisson process can be generated by sampling exponential inter-arrival times and cumulatively summing them.

# A) Exact test for H0: λ = λ0 using S = sum(X_i)

true_rate = 1.25
n = 50
x = sample_expon_inverse(n, rate=true_rate, rng=rng)
S = x.sum()

lambda0 = 1.0  # null hypothesis

# Under H0: S ~ Gamma(shape=n, scale=1/lambda0)
S_dist = stats.gamma(a=n, scale=1 / lambda0)

cdf = S_dist.cdf(S)
# Two-sided p-value by tail probability (simple symmetric-tail construction)
p_two_sided = 2 * min(cdf, 1 - cdf)

# Likelihood ratio test (asymptotic chi^2_1)
rate_mle = n / S
lrt = 2 * (exp_loglik(rate_mle, x) - exp_loglik(lambda0, x))
p_lrt = 1 - stats.chi2(df=1).cdf(lrt)

{
    "S": S,
    "rate_mle": rate_mle,
    "p_two_sided_exact": p_two_sided,
    "lrt_stat": lrt,
    "p_lrt_asymptotic": p_lrt,
}
{'S': 43.80640950151249,
 'rate_mle': 1.1413854860274195,
 'p_two_sided_exact': 0.3857960784607864,
 'lrt_stat': 0.8371053131042316,
 'p_lrt_asymptotic': 0.3602259671779372}
# B) Bayesian update: Gamma prior on λ

# Prior: λ ~ Gamma(alpha0, beta0) in (shape, rate) form
alpha0 = 2.0
beta0 = 1.0

# Data
true_rate = 1.25
n = 40
x = sample_expon_inverse(n, rate=true_rate, rng=rng)
S = x.sum()

# Posterior: λ | x ~ Gamma(alpha0+n, beta0+S)
alpha_post = alpha0 + n
beta_post = beta0 + S

lam_grid = np.linspace(0, 4, 500)
prior_pdf = stats.gamma(a=alpha0, scale=1 / beta0).pdf(lam_grid)
post_pdf = stats.gamma(a=alpha_post, scale=1 / beta_post).pdf(lam_grid)

fig = go.Figure()
fig.add_trace(go.Scatter(x=lam_grid, y=prior_pdf, mode="lines", name="prior"))
fig.add_trace(go.Scatter(x=lam_grid, y=post_pdf, mode="lines", name="posterior"))
fig.add_vline(x=true_rate, line_dash="dot", opacity=0.5)
fig.update_layout(
    title="Gamma prior → Gamma posterior for the exponential rate λ",
    xaxis_title="λ",
    yaxis_title="density",
)
fig.show()

post_mean = alpha_post / beta_post
post_mean
1.1823669274709439
# C) Poisson process simulation via exponential inter-arrival times

def simulate_poisson_process(rate: float, T: float, rng: np.random.Generator) -> np.ndarray:
    if rate <= 0:
        raise ValueError("rate must be > 0")
    if T <= 0:
        raise ValueError("T must be > 0")

    t = 0.0
    times: list[float] = []
    while True:
        t += sample_expon_inverse(1, rate=rate, rng=rng)[0]
        if t > T:
            break
        times.append(t)

    return np.asarray(times)


rate = 1.2
T = 10.0
arrival_times = simulate_poisson_process(rate=rate, T=T, rng=rng)

# Build a step plot for N(t)
t_grid = np.linspace(0, T, 400)
counts = np.searchsorted(arrival_times, t_grid, side="right")

fig = go.Figure()
fig.add_trace(go.Scatter(x=t_grid, y=counts, mode="lines", name="N(t)"))
fig.update_layout(
    title=f"One Poisson process sample path (λ={rate:g}, T={T:g})",
    xaxis_title="time t",
    yaxis_title="count N(t)",
)
fig.show()

len(arrival_times), arrival_times[:5]
(15, array([0.0376, 0.8452, 1.5469, 2.4493, 3.0343]))

11) Pitfalls#

  • Rate vs scale confusion: many texts use the rate (\lambda); SciPy’s expon uses scale (\theta = 1/\lambda).

  • Location shifts: expon(loc=ℓ, ...) changes support to ([\ell,\infty)). If your model assumes support ([0,\infty)), fit with floc=0.

  • Nonnegative data: exponential likelihood is zero for negative observations. If you see negatives, your preprocessing/model is inconsistent.

  • Censoring/truncation: survival/reliability data is often right-censored; naive MLE on observed times can be biased.

  • Numerical issues:

    • use log1p(-u) for inverse transform when (u) is close to 0

    • for large (\lambda x), exp(-λx) underflows; use log-PDF (scipy.stats.expon.logpdf) when needed

  • Goodness-of-fit tests with fitted parameters: a plain KS test after fitting parameters is not distribution-free (Lilliefors-type corrections apply).
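To make the log-PDF point above concrete (a small sketch with an illustrative rate and evaluation point):

```python
# For large λx, pdf underflows to 0 and its log becomes -inf,
# while logpdf stays finite and exact.
import numpy as np
from scipy import stats

lam = 1.7
rv = stats.expon(scale=1 / lam)
x_big = 500.0  # λx = 850, beyond exp's underflow threshold (≈ 745 in double)

print(rv.pdf(x_big))  # 0.0 (underflow)
with np.errstate(divide="ignore"):
    print(np.log(rv.pdf(x_big)))  # -inf
print(rv.logpdf(x_big))  # log(1.7) - 850 ≈ -849.47
```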

12) Summary#

  • expon is a continuous distribution on ([0,\infty)) used for memoryless waiting times.

  • PDF/CDF: (f(x)=\lambda e^{-\lambda x}), (F(x)=1-e^{-\lambda x}) for (x\ge 0).

  • Mean/variance: (\mathbb{E}[X]=1/\lambda), (\mathrm{Var}(X)=1/\lambda^2); skewness (=2), excess kurtosis (=6).

  • Inverse-CDF sampling is simple: (X=-\ln(1-U)/\lambda).

  • SciPy mapping: expon(scale=1/λ) (and optionally loc).

  • Common use cases include hypothesis tests about rates, conjugate Bayesian updates (Gamma prior), and simulating Poisson processes.